The CENTER FOR THE STUDY OF EVALUATION OF INSTRUCTIONAL 
PROGRAMS is engaged in research that will yield new ideas 
and new tools capable of analyzing and evaluating instruc- 
tion. Staff members are creating new ways to evaluate con- 
tent of curricula, methods of teaching and the multiple 
effects of both on students. The CENTER is unique because 
of its access to Southern California’s elementary, second- 
ary and higher schools of diverse socio-economic levels 
and cultural backgrounds. 



COMMENTS ON PROFESSOR ALKIN'S PAPER ENTITLED 
"EVALUATING THE COST-EFFECTIVENESS OF INSTRUCTIONAL PROGRAMS" 

John Bormuth 

I think Dr. Alkin takes a very interesting, if not actually dar- 
ing, position before this group when in his paper he asserts that the 
evaluation of instruction is not really complete unless it includes 
some assessment of the costs involved in instruction and is not sim- 
ply a measure of the behavioral outcomes of that instruction. In 
other words, to serve sufficiently the purpose of making educational 
decisions and educational policies, evaluation must also provide us 
with a description which expresses the benefits of instruction in 
relation to its costs. 

My first reaction to this proposition is to give my hearty 
endorsement to the general idea, for the reason that to reject 
it would be approximately equivalent to saying that the values 
and products achieved through education are somehow exempt from 
competing with all our other values for their fair share of the 
public resources. We must realize that education must compete 
with space projects, war, cosmetics, and dog food for the citizen’s 
dollar. 

If the taxpayer has in the past been unwilling to provide 
education with all the financial support that some of us thought 
was its due, perhaps his hesitancy is attributable to the natural 
aversion to buying a pig in a poke. Asking the taxpayer to put 
out money year after year in ever-increasing amounts for vaguely 
described products having even more vaguely described costs 
attached to them places a rather heavy strain on his credulity. 
Perhaps we should marvel at the fact that the taxpayer is so gener- 
ous, rather than complain at his seeming parsimony. 

If we really believe that public policy should be based upon 
informed public opinion, if we really believe that people have the 
right to know the effects that any treatment has upon them, and if 
we really believe that people have a right to know how their money 
is being spent, then we must agree with Dr. Alkin that the cost-to- 
benefit ratio of instruction must somehow be assessed. Hence, 
Dr. Alkin is asking us to reject the narrower conception of the 
role of evaluation. He is advocating that evaluation can and must 
play an important part in the formation of a public policy on edu- 
cation. 

I have nothing but applause for Dr. Alkin’s contention that 
evaluation should play a role in the formation of public policy on 
education. However, I have grave doubts that evaluation is suffi- 
ciently developed to play such a role. My argument is that evalua- 
tion is based upon the observation of student responses to some 
sorts of tasks, which I will henceforth call test items, regardless 
of what their specific form might happen to be; these items are 
derived by some obscure procedure, and they are then selected for in- 
clusion in a test on the basis of authoritarian judgments of some 
sort. Hence, a test score necessarily represents what the test- 
writers and judges choose to measure and cannot be interpreted as a 
measure of what the instruction actually taught. I further claim 
that until we are able to specify objectively the population of all 
possible test items that can be constructed for a given course of 
instruction, and that until we have developed a set of rules for me- 
chanically deriving these questions directly from the instructional 
stimuli themselves, evaluation cannot provide us with information 
which can have sufficient scientific validity to meet the require- 
ments implied by Dr. Alkin’s proposal. 

No one familiar with test-making procedures would seriously 
challenge the statement that the items which go into a test are 
selected solely on the basis of authoritative judgment of their 
relevance and their importance to the instruction. What seems to 
have been overlooked frequently is the fact that this method of 
test item selection makes the information from such tests unaccept- 
able for the serious purpose of making public policy because its 
result is that the test scores tell us only what the test-makers 
want us to know. We have no way of determining what all the other 
things were that could have been taught by the instruction, nor can 
we even be certain that any performance on the items actually stem- 
med from the instruction presumably being examined. Therefore, we 
must regard test items as containing an indeterminate bias. As a 
result, we cannot accept test results as reliable data upon which 
to base decisions of public policy. 

It seems necessary for us to conclude, then, that evaluation 
techniques can never perform an important role in the making of 
public policy. Now, is this so? I think not; but before they can 
do so, new techniques must be developed. These techniques must have 
the following characteristics: 

1. They must permit us to enumerate exhaustively the behaviors 
that can be acquired as a result of exposure to a course of instruc- 
tion. The resulting knowledge will allow us to inspect thoroughly 
the effects of a program and to draw a set of items that will enable 
us to examine an unbiased sample of those behaviors. 

2. Our test construction techniques must permit us to derive 
the test items in a mechanical and completely reproducible process. 

3. If taxonomic classifications are to be used in any way to 
describe the items so derived, these taxonomic classes must be de- 
fined in terms of the transformations by which they were derived 
from the instructional stimuli. 
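
The first two requirements can be illustrated with a minimal sketch. It is 
not the procedure proposed in this paper; it only assumes, for illustration, 
that the single derivation rule is "blank out one word to form a completion 
item." The function names, the blank marker, and the seed are hypothetical, 
and the sentence is the one used as an example later in these comments. 

```python
import random

def completion_items(sentences):
    """Enumerate the whole item population: one completion item
    for every word position in every sentence, in a fixed order."""
    items = []
    for s_idx, sentence in enumerate(sentences):
        words = sentence.split()
        for w_idx, answer in enumerate(words):
            # Replace exactly one word with a blank; keep all others.
            stem = " ".join(
                "_____" if i == w_idx else w for i, w in enumerate(words)
            )
            items.append(
                {"sentence": s_idx, "position": w_idx,
                 "stem": stem, "answer": answer}
            )
    return items

def draw_test(items, k, seed):
    """Draw k items uniformly at random without replacement.
    A fixed seed makes the drawing reproducible by anyone."""
    return random.Random(seed).sample(items, k)

# Hypothetical one-sentence "course of instruction".
instruction = ["High mountains tend to exhibit rapid hydraulic erosion."]
population = completion_items(instruction)
test_a = draw_test(population, 3, seed=1)
test_b = draw_test(population, 3, seed=1)  # same seed, same test
```

Because the population is produced by a rule rather than by judgment, two 
people applying the rule to the same instruction obtain the same list of 
items, and a random drawing from that list does not depend on what a test- 
writer happened to think important. 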

Although the meeting of these requirements may sound like an 
impossible task, it seems that the goals can be realized in a fairly 
adequate way. I suggested one possible solution in a paper I did 
for the Research and Development Center at UCLA a year ago. I 
might add that Professor Anderson’s transformations the other day seemed 
to be hitting very much at the same sort of solution. 

I began my work with the statement that the knowledge trans- 
mitted by a course of instruction may be regarded as a closed sys- 
tem of statements phrased either in natural language or in some 
other symbolic system governed by syntactic constraints. When the 
instruction can be cast into this form, many of the test questions 
which are ordinarily constructed are expressible as transformations 
of the sentences occurring in the instruction. For example, suppose 
in the instruction we have a sentence of this sort: "High mountains 
tend to exhibit rapid hydraulic erosion." From this statement, by 
a specifiable transformation, we can produce the question, "What 
kind of mountains tend to exhibit rapid erosion?" Or still another 
question, by a slightly different set of transformations, would be, 
"What kind of geologic feature tends to be affected by the destruc- 
tive forces of runoff?" There are a number of other questions that 
could be derived from exactly the same sentence, each of which would 
be derived by a slightly different transformation. These are innumer- 
able in every sense of the word, and they are objectively derived. 
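
The two questions quoted above can be reproduced by a rough sketch of such 
transformations. Everything in it is an assumption made for illustration: 
the hand-made three-part analysis of the sentence, the function names, and 
the two small lookup tables standing in for a lexicon of more general terms 
and paraphrases. It shows only that the derivation can be stated as explicit, 
repeatable operations, not how the rules were actually formulated. 

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sentence:
    modifier: str   # subject modifier, e.g. "High"
    head: str       # subject head noun, e.g. "mountains"
    predicate: str  # remainder of the sentence, without the period

def wh_modifier(s, drop=()):
    """Replace the subject modifier with 'What kind of' and
    optionally delete the listed words from the predicate."""
    words = [w for w in s.predicate.split() if w not in drop]
    return f"What kind of {s.head} {' '.join(words)}?"

# Hypothetical lookup tables: a more general term for the head noun
# and a paraphrase of the predicate that agrees with it.
SUPERORDINATE = {"mountains": "geologic feature"}
PARAPHRASE = {
    "tend to exhibit rapid hydraulic erosion":
        "tends to be affected by the destructive forces of runoff",
}

def wh_generalized(s):
    """Same wh-replacement, but with the head noun generalized
    and the predicate paraphrased."""
    return f"What kind of {SUPERORDINATE[s.head]} {PARAPHRASE[s.predicate]}?"

source = Sentence("High", "mountains",
                  "tend to exhibit rapid hydraulic erosion")
q1 = wh_modifier(source, drop=("hydraulic",))
q2 = wh_generalized(source)
answer = source.modifier.lower()  # the deleted modifier answers both
```

Here q1 is the first question quoted above and q2 the second; both are 
answered by the modifier that the transformation removed. The second 
question differs from the first only in the additional substitutions applied, 
which is what allows each class of question to be defined by the 
transformation that produced it. 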

The questions derived in this manner are not just those ordi- 
narily judged to be measuring acquisition of explicitly stated 
facts but also include questions measuring various degrees of gen- 
eralization and transfer. Further, these classes of questions are 
objectively definable within the system of transformations used, 
but in general the questions so derived by the particular set of 
transformations I have been talking about deal with what we ordi- 
narily classify as explicitly stated facts. 

More recently, however, I have begun analyzing the syntactic 
constraints existing between sentences. These analyses seem to 
be leading to an ability to deal with questions commonly judged 
to be testing "knowledge of higher level concepts and more complex 
processes." Indeed, I seem to be getting the intuitively satis- 
fying result that the traditional essay question has a generic 
kinship to the mundane short answer completion question. The two 
types of questions simply represent transformations operating at 
different levels in the syntactic structure of the discourse. 

Many of Anderson’s questions appear to fall within the classes 
derived by these transformations. But some of them also appear to 
represent transformations of an order that differs from any I had 
yet thought about. 

What I am arguing, then, is that we need a theory of test writ- 
ing and that until we have such a theory, the practical use of evalua- 
tion for the formation of public policy does not seem to me to be 
possible. 



